Learning from Data Streams with Concept Drift
Abstract
Increasing access to very large, nonstationary datasets, together with the corresponding demand to analyse them, has led to the development of new online algorithms for machine learning on data streams. An important feature of real-world data streams is "concept drift," whereby the distributions underlying the data can change arbitrarily over time. Concept drift makes many classical data mining techniques unsuitable, so new approaches must be developed in their place. In pursuit of this goal, we introduce the dynamic logistic regressor (DLR), a sequential Bayesian approach for performing binary classification on nonstationary data streams. We show how the DLR framework can be extended to cope with missing observations and with missing and corrupted labels. We then describe a new meta-algorithm for performing classification and regression on data streams with concept drift. The convex hull of receiver operating characteristic (ROC) curves has long been used for identifying potentially optimal classifiers. Unfortunately, the ROC curve does not perform as expected when learning from data streams exhibiting concept drift. We introduce a modification to the ROC curve that provides an easily maintainable online summary of a classifier's performance, even in the presence of concept drift. We similarly modify the recently introduced regression error characteristic (REC) curve, giving analogous dynamic summaries of online regressors. We then introduce a system for online ensemble learning that uses these dynamic performance curves. Using the convex hulls of these curves, we develop a simple framework for supervised learning with drifting data streams. We present empirical evidence from real and simulated data showing that the proposed method performs better than selected previous solutions.
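To make the idea of an "easily maintainable online summary" concrete, the following is a minimal sketch of one way a single ROC operating point could be tracked under drift. It is not the modification proposed in this work, whose details the abstract does not give. It assumes exponential forgetting of the confusion counts, and the class name `DecayedRocPoint`, the `decay` parameter and the helper `run_demo` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DecayedRocPoint:
    """One (TPR, FPR) operating point with exponential forgetting.

    Illustrative assumption only: the decayed-count scheme is not taken
    from the source document.
    """
    decay: float = 0.9  # hypothetical forgetting factor, 0 < decay <= 1
    tp: float = 0.0
    fp: float = 0.0
    tn: float = 0.0
    fn: float = 0.0

    def update(self, predicted: bool, actual: bool) -> None:
        # Down-weight every past outcome, then add the new one with weight 1.
        self.tp *= self.decay
        self.fp *= self.decay
        self.tn *= self.decay
        self.fn *= self.decay
        if predicted and actual:
            self.tp += 1.0
        elif predicted and not actual:
            self.fp += 1.0
        elif not predicted and actual:
            self.fn += 1.0
        else:
            self.tn += 1.0

    def tpr(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives > 0 else 0.0

    def fpr(self) -> float:
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives > 0 else 0.0


def run_demo() -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return (TPR, FPR) before and after an abrupt, simulated concept flip."""
    point = DecayedRocPoint(decay=0.9)
    # Phase 1: the fixed rule "x > 0" matches the true concept.
    for i in range(200):
        x = 1.0 if i % 2 == 0 else -1.0
        point.update(predicted=x > 0, actual=x > 0)
    before = (point.tpr(), point.fpr())
    # Phase 2: the concept flips, so the same rule is now always wrong.
    for i in range(200):
        x = 1.0 if i % 2 == 0 else -1.0
        point.update(predicted=x > 0, actual=x <= 0)
    after = (point.tpr(), point.fpr())
    return before, after
```

In this toy stream the summary moves from the ideal corner of ROC space towards the worst corner once the concept flips, whereas undecayed counts would average over both phases. The choice of `decay` trades responsiveness against variance and is an assumption of the sketch.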
Similar resources
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...
Boosting classifiers for drifting concepts
This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows to quantify the drift in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the...
An Ensemble Classifier for Drifting Concepts
This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows to quantify the drift in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the...
Dynamic Weighted Majority for Incremental Learning of Imbalanced Data Streams with Concept Drift
Concept drifts occurring in data streams will jeopardize the accuracy and stability of the online learning process. If the data stream is imbalanced, it will be even more challenging to detect and cure the concept drift. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-ba...
Learning from Ontology Streams with Semantic Concept Drift
Data stream learning has been largely studied for extracting knowledge structures from continuous and rapid data records. In the semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift i.e., unexpected changes in data distribution, causing most of models ...
Publication date: 2008